Add support for running vLLM #799
base: main
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
📝 Walkthrough

Adds a new vLLM workload: documentation, a vllm workload package (models, test definition, bench args), Slurm command-generation for aggregated/disaggregated runs, report parsing/renderer, registration wiring, reference SBATCH scripts, and tests for command generation and result parsing.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ Passed checks (2 passed)
Actionable comments posted: 15
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/test_test_scenario.py (1)
474-496: ⚠️ Potential issue | 🟠 Major

`VllmTestDefinition` is counted in `reports_map` (17) but not verified in `test_custom_reporters`. The registry size was bumped to 17 to account for the new vLLM reporter, but the parametrized `test_custom_reporters` list (Lines 480–496) still only has 16 entries and does not include `(VllmTestDefinition, {VLLMBenchReportGenerationStrategy})`. Add it to ensure the mapping is tested.

💚 Proposed fix:

```diff
     (NixlPerftestTestDefinition, {NIXLKVBenchDummyReport}),
     (AiconfiguratorTestDefinition, {AiconfiguratorReportGenerationStrategy}),
+    (VllmTestDefinition, {VLLMBenchReportGenerationStrategy}),
 ],
```

(Along with the corresponding imports at the top of the file.)
tests/test_init.py (1)
2-2: ⚠️ Potential issue | 🟡 Minor: Fix copyright year to satisfy CI.
The pipeline reports: "Copyright header year is invalid. Expected year range ending 2026 but found 2025."
```diff
-# Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
```
🤖 Fix all issues with AI agents
In `@doc/workloads/vllm.rst`:
- Around line 86-94: The TOML snippet visually implies `[extra_env_vars]` is
nested under `[cmd_args.prefill]`; update the example by inserting a clear
separation (e.g., a blank line and/or a comment) between `[cmd_args.prefill]`
and `[extra_env_vars]`, or expand to show the surrounding test-definition
context so readers see that `[extra_env_vars]` is a sibling of `[cmd_args]`;
ensure the change references the existing section names `[cmd_args.prefill]` and
`[extra_env_vars]` so it’s obvious they are separate blocks.
- Line 81: Fix the grammatical error in the sentence about enabling
disaggregation: replace "one need to set ``prefill`` configuration" with "one
needs to set the ``prefill`` configuration" (update the sentence in
doc/workloads/vllm.rst that references the ``prefill`` configuration).
In `@src/cloudai/registration.py`:
- Line 260: The test registration currently adds the VllmTestDefinition under
the lowercase key "vllm", causing lookups for "Vllm" to fail; change the
registration call in Registry().add_test_definition(...) to use the PascalCase
key "Vllm" (i.e., Registry().add_test_definition("Vllm", VllmTestDefinition)) so
test_template_name lookups match existing usages of "Vllm".
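A minimal sketch of the corrected call; only the key string changes, and the `Registry` import stays whatever `registration.py` already uses (the import below is the assumed export path):

```python
from cloudai.workloads.vllm import VllmTestDefinition

# PascalCase key so existing test_template_name = "Vllm" lookups resolve
Registry().add_test_definition("Vllm", VllmTestDefinition)
```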
In `@src/cloudai/workloads/vllm/report_generation_strategy.py`:
- Around line 1-11: Add the standard SPDX and copyright header block at the very
top of this module (above all imports) to match the other new files in this PR:
include the lines starting with "# SPDX-FileCopyrightText: NVIDIA CORPORATION &
AFFILIATES" and the appropriate license identifier line (e.g., "#
SPDX-License-Identifier: Apache-2.0" or the license used in the repo). Ensure
the header appears before the existing imports and does not modify symbols such
as ReportGenerationStrategy or VLLM_BENCH_JSON_FILE so the rest of the file
(imports, BaseModel/ConfigDict usage, and references to VLLM_BENCH_JSON_FILE)
remain unchanged.
- Around line 63-64: Protect against division by zero when computing the percent
in the table.add_row call by checking results.num_prompts before performing
results.completed / results.num_prompts; if results.num_prompts is 0, substitute
a safe value or placeholder (e.g., "0.00% (0 of 0)" or "N/A") instead of
dividing. Update the code path around table.add_row in
report_generation_strategy.py (the block that formats f"{results.completed /
results.num_prompts * 100:.2f}% ({results.completed} of {results.num_prompts})")
to conditionally compute the percent only when results.num_prompts > 0 and
otherwise insert the chosen fallback string. Ensure the change uses the existing
results.completed and results.num_prompts symbols so callers and tests continue
to reference the same fields.
- Around line 31-40: The parse_vllm_bench_output function currently calls
json.load and VLLMBenchReport.model_validate without handling parsing/validation
errors; wrap the json.load and model_validate calls in a try/except that catches
json.JSONDecodeError and pydantic.ValidationError (and optionally generic
Exception) and returns None on failure so corrupt or unexpected files don't
crash report generation; keep the function signature and cache decorator
unchanged and reference both parse_vllm_bench_output and
VLLMBenchReport.model_validate when implementing the try/except.
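A combined sketch of the defensive parsing and the zero-guard requested in the two comments above. The `VLLMBenchReport` stub is trimmed to the two fields involved (the real model in this module carries the full benchmark schema), and `completed_cell` is a hypothetical helper name used only to isolate the formatting logic:

```python
import json
from pathlib import Path

from pydantic import BaseModel, ValidationError


class VLLMBenchReport(BaseModel):
    completed: int
    num_prompts: int
    # ...remaining benchmark fields omitted in this sketch


def parse_vllm_bench_output(res_file: Path) -> VLLMBenchReport | None:
    """Return parsed bench results, or None when the file is corrupt or unexpected."""
    try:
        with res_file.open() as fh:
            data = json.load(fh)
        return VLLMBenchReport.model_validate(data)
    except (json.JSONDecodeError, ValidationError):
        return None


def completed_cell(results: VLLMBenchReport) -> str:
    """Format the 'completed' column without dividing by zero."""
    if results.num_prompts > 0:
        pct = results.completed / results.num_prompts * 100
        return f"{pct:.2f}% ({results.completed} of {results.num_prompts})"
    return "N/A (0 prompts)"
```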
In `@src/cloudai/workloads/vllm/slurm_command_gen_strategy.py`:
- Around line 175-183: Both _gen_aggregated_script and _gen_disaggregated_script
declare the parameter output_path without a type; annotate output_path: Path to
match the codebase convention (e.g., def _gen_aggregated_script(...,
output_path: Path) -> str) and do the same for _gen_disaggregated_script; also
ensure Path is imported (from pathlib import Path) and update any references to
VllmCmdArgs or return typing if needed to keep type-checking happy.
- Around line 240-252: The decode-step exports (export
CUDA_VISIBLE_DEVICES="{decode_gpus}" and export
VLLM_NIXL_SIDE_CHANNEL_PORT=$DECODE_NIXL_PORT) leak into subsequent proxy and
benchmark srun steps; either scope the decode exports in a subshell around the
srun invocation (so the environment resets after the command), or add explicit
unsets (unset CUDA_VISIBLE_DEVICES; unset VLLM_NIXL_SIDE_CHANNEL_PORT)
immediately after DECODE_PID is set; locate the block around
PREFILL_PID/DECODE_PID and modify the decode section that constructs
decode_cmd/vllm-decode.log to apply one of these fixes.
- Around line 30-32: The methods image_path, get_vllm_serve_commands,
get_proxy_command, get_vllm_bench_command, generate_wait_for_health_function,
and _gen_srun_command duplicate casting of self.test_run.test to
VllmTestDefinition; replace those casts with the existing self.tdef property to
remove duplication and ensure consistent access to the VllmTestDefinition
instance (update image_path to return str(self.tdef.docker_image.installed_path)
and similarly use self.tdef inside each of the listed methods).
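For the decode-export leak flagged above (around lines 240-252), one option is to have the generator emit explicit `unset` lines right after `DECODE_PID` is captured. A sketch under the assumption that the script is built as a list of shell lines; `decode_gpus` and `decode_cmd` are the values referenced in the comment:

```python
def _gen_decode_block(decode_gpus: str, decode_cmd: str) -> str:
    return "\n".join(
        [
            f'export CUDA_VISIBLE_DEVICES="{decode_gpus}"',
            "export VLLM_NIXL_SIDE_CHANNEL_PORT=$DECODE_NIXL_PORT",
            f"{decode_cmd} > vllm-decode.log 2>&1 &",
            "DECODE_PID=$!",
            # Reset the decode-only environment so the later proxy/benchmark
            # srun steps (launched with --export=ALL) do not inherit it.
            "unset CUDA_VISIBLE_DEVICES",
            "unset VLLM_NIXL_SIDE_CHANNEL_PORT",
        ]
    )
```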
In `@src/cloudai/workloads/vllm/vllm.py`:
- Around line 106-115: The log parsing in was_run_successful is fragile: when
reading lines after the "============ Serving Benchmark Result ============"
marker the code assumes line.split()[2] exists and is an integer
(num_successful_requests); change this to defensively parse the "Successful
requests:" line in the was_run_successful function by first verifying the split
has at least three parts (or use a regex to extract the digits after "Successful
requests:"), and wrap the int conversion in a try/except to handle ValueError;
if parsing fails, skip that line and continue, and only return
JobStatusResult(is_successful=True) when a valid positive integer is obtained.
- Around line 36-42: The serve_args property currently stringifies all values
from self.model_dump(exclude={"gpu_ids"}) which produces invalid CLI forms for
None, bool, and list types; update serve_args to skip None values, treat
booleans as flags (include "--key" when True and omit when False), and serialize
list values into the expected vLLM CLI form (e.g., join items with commas or
repeat the flag as appropriate for your CLI) while still converting other
primitive types to strings and preserving the key-to-flag conversion
(k.replace('_','-')) so that serve_args emits valid arguments for vllm serve.
- Around line 79-80: The private attributes _docker_image and _hf_model should
be declared using Pydantic v2's PrivateAttr to be stable: import PrivateAttr
from pydantic and replace the bare annotated defaults (`_docker_image:
DockerImage | None = None`, `_hf_model: HFModel | None = None`) with
PrivateAttr(default=None) assignments (e.g., set `_docker_image =
PrivateAttr(default=None)` and `_hf_model = PrivateAttr(default=None)`) so the
vllm model class uses the supported private-attribute API.
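A sketch of the `serve_args` shape suggested above, returning a list of CLI tokens. The exclude set and the `_` to `-` flag convention come from the comment; comma-joining list values is only one possible convention, since some vLLM options may instead expect repeated flags or JSON:

```python
@property
def serve_args(self) -> list[str]:
    args: list[str] = []
    for key, value in self.model_dump(exclude={"gpu_ids"}).items():
        flag = f"--{key.replace('_', '-')}"
        if value is None:
            continue  # omit unset options entirely
        if isinstance(value, bool):
            if value:
                args.append(flag)  # booleans act as presence flags
            continue
        if isinstance(value, list):
            args.extend([flag, ",".join(str(item) for item in value)])
            continue
        args.extend([flag, str(value)])
    return args
```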
In
`@tests/job_status_retrieval_strategy/test_vllm_job_status_retrieval_strategy.py`:
- Line 64: Rename the test method currently named test_no_succesfull_requests to
test_no_successful_requests (fix the typo) in the test class so it follows
naming conventions; update any references or usages (e.g., test discovery or
calls) to the old name, and ensure the test decorator or fixtures (like base_tr:
TestRun) remain unchanged—look for the function definition def
test_no_succesfull_requests(...) and change its identifier to def
test_no_successful_requests(...).
In `@tests/ref_data/vllm-disagg.sbatch`:
- Around line 48-59: The decode srun exports CUDA_VISIBLE_DEVICES and
VLLM_NIXL_SIDE_CHANNEL_PORT into the environment (see the second vllm serve
launch and PREFILL_PID usage), and because later srun calls use --export=ALL
those values leak into proxy/benchmark steps; after launching the decode srun
(the vllm serve --port 8200 --kv-transfer-config ... & block that sets
PREFILL_PID/DECODE), explicitly unset or reset CUDA_VISIBLE_DEVICES and
VLLM_NIXL_SIDE_CHANNEL_PORT (or set them to empty/default) so subsequent srun
commands don’t inherit them, ensuring only the intended vllm processes see those
vars.
In `@tests/test_test_scenario.py`:
- Around line 474-475: Update the copyright header in
tests/test_test_scenario.py to end with 2026 (change the year range on the
header line from 2024-2025 to 2024-2026); open the file, edit the top-of-file
copyright comment (near the file header, not in the test body or the
test_default_reporters_size function), save and run the CI check to confirm the
header now matches 2024-2026.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/test_test_scenario.py (1)
474-499: ⚠️ Potential issue | 🟡 Minor

Missing `VllmTestDefinition` entry in `test_custom_reporters`. The count on Line 475 was bumped to 17 to account for the new vLLM reporter, but the parametrized `test_custom_reporters` (lines 477–496) still only covers 16 entries — it does not include a `(VllmTestDefinition, {VLLMBenchReportGenerationStrategy})` case. This means the vLLM reporter registration is never actually verified.

🧪 Proposed fix — add the missing parametrize entry:

```diff
     (NixlPerftestTestDefinition, {NIXLKVBenchDummyReport}),
     (AiconfiguratorTestDefinition, {AiconfiguratorReportGenerationStrategy}),
+    (VllmTestDefinition, {VLLMBenchReportGenerationStrategy}),
 ],
```

This will also require adding the corresponding imports at the top of the file:

```python
from cloudai.workloads.vllm import VLLMBenchReportGenerationStrategy, VllmTestDefinition
```
Actionable comments posted: 6
🤖 Fix all issues with AI agents
In `@doc/workloads/vllm.rst`:
- Around line 92-93: The disaggregated TOML snippet is ambiguous because
[extra_env_vars] is shown at top level without context; update the example in
vllm.rst to show a complete TOML block including the top-level keys (e.g., name,
test_template_name, executor/test settings) and explicitly show whether
[extra_env_vars] is a sibling of [cmd_args] or nested under it (for example,
include [cmd_args] with its keys then a separate [extra_env_vars] section), so
readers can unambiguously see the intended section hierarchy and placement of
CUDA_VISIBLE_DEVICES.
- Line 73: Replace the awkward phrase "from less priority to more priority" in
the sentence "The number of GPUs can be controlled using the options below,
listed from less priority to more priority:" with a clearer alternative such as
"from lowest to highest priority" or "in order of increasing priority" so the
sentence reads e.g. "The number of GPUs can be controlled using the options
below, listed from lowest to highest priority:"; update the string where that
sentence appears in the vllm.rst documentation.
In `@src/cloudai/workloads/vllm/report_generation_strategy.py`:
- Around line 47-48: The use of functools.cache on parse_vllm_bench_output
causes indefinite memoization by Path and can return stale results if the file
changes; update the function to either remove the `@cache` decorator or change the
cache key to include the file's modification state (e.g., use an explicit
memoization keyed by (res_file, res_file.stat().st_mtime) or a TTL/lru cache) so
cached entries are invalidated when the file is updated; locate
parse_vllm_bench_output and replace the `@cache` usage with one of these
strategies to ensure fresh results for changed files.
- Around line 53-58: The except clause in the block that opens res_file and
calls VLLMBenchReport.model_validate(data) is redundant because
json.JSONDecodeError is already an Exception; update the except clause from
"except (json.JSONDecodeError, Exception):" to a single "except Exception:" so
it no longer lists duplicate exception types while preserving the current error
handling behavior.
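One way to keep the memoization while addressing the staleness comment above is to fold the file's mtime into the cache key, so a rewritten file gets re-parsed. A sketch, with `VLLMBenchReport` standing in for the model defined in this module and `_parse_cached` as a hypothetical helper name:

```python
import json
from functools import lru_cache
from pathlib import Path

from pydantic import ValidationError


def parse_vllm_bench_output(res_file: Path) -> "VLLMBenchReport | None":
    """Public entry point: resolve the path and key the cache on (path, mtime)."""
    res_file = res_file.resolve()
    try:
        mtime_ns = res_file.stat().st_mtime_ns
    except OSError:
        return None
    return _parse_cached(res_file, mtime_ns)


@lru_cache(maxsize=32)
def _parse_cached(res_file: Path, mtime_ns: int) -> "VLLMBenchReport | None":
    # mtime_ns is unused in the body; it only widens the cache key so that
    # an updated file invalidates the previously cached result.
    try:
        return VLLMBenchReport.model_validate(json.loads(res_file.read_text()))
    except (json.JSONDecodeError, ValidationError):
        return None
```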
In `@src/cloudai/workloads/vllm/slurm_command_gen_strategy.py`:
- Around line 258-268: The script launches the proxy in background (proxy_cmd,
PROXY_PID) and immediately starts the benchmark (bench_cmd), causing potential
failures if the proxy isn't ready; update the generated shell to wait for proxy
readiness by invoking the existing wait_for_health helper (or a short sleep)
against the proxy endpoint after starting the proxy and before running
bench_cmd, ensuring the health check references the same proxy port/URL used by
proxy_cmd and still retains PROXY_PID handling.
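For the proxy-readiness comment above, a sketch of a generated wait loop that could run between starting the proxy and launching `bench_cmd`; the `/health` endpoint path and the timeout are assumptions, and `proxy_port` must match whatever port `proxy_cmd` binds:

```python
def gen_wait_for_proxy(proxy_port: int, timeout_s: int = 600) -> str:
    """Emit shell lines that poll the proxy before the benchmark step starts."""
    return "\n".join(
        [
            f"for _ in $(seq {timeout_s}); do",
            f'  if curl -sf "http://localhost:{proxy_port}/health" > /dev/null; then break; fi',
            "  sleep 1",
            "done",
        ]
    )
```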
In `@tests/slurm_command_gen_strategy/test_vllm_slurm_command_gen_strategy.py`:
- Around line 55-60: The fixture vllm_disagg_tr mutates the shared vllm fixture;
instead create a fresh VllmTestDefinition instance (or deep copy the existing
vllm) inside vllm_disagg_tr, set its extra_env_vars =
{"CUDA_VISIBLE_DEVICES":"0,1,2,3"} and its cmd_args.prefill = VllmArgs() on that
new instance, then pass the new instance to TestRun(test=...) so vllm remains
unchanged; reference the vllm_disagg_tr fixture, VllmTestDefinition (or use
copy.deepcopy(vllm)), TestRun, and VllmArgs when making the change.
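A sketch of the fixture change described above. The import line and the fixture name `vllm_disagg_tdef` are assumptions; the remaining `TestRun` arguments stay exactly as the current fixture builds them:

```python
import copy

import pytest

from cloudai.workloads.vllm import VllmArgs, VllmTestDefinition  # exact exports assumed


@pytest.fixture
def vllm_disagg_tdef(vllm: VllmTestDefinition) -> VllmTestDefinition:
    """Disaggregated copy so the shared `vllm` fixture is never mutated."""
    disagg = copy.deepcopy(vllm)
    disagg.extra_env_vars = {"CUDA_VISIBLE_DEVICES": "0,1,2,3"}
    disagg.cmd_args.prefill = VllmArgs()
    return disagg  # hand this to TestRun(test=...) inside vllm_disagg_tr
```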
Greptile Overview

Greptile Summary: This PR adds a new vLLM workload. … Main issues to address before merge are in the vLLM Slurm command generation: (1) bench “extra args” are currently wired through …

Confidence Score: 3/5
Important Files Changed
14 files reviewed, 2 comments
Additional Comments (2)
Consider returning …
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/cloudai/workloads/vllm/report_generation_strategy.py`:
- Around line 31-45: VLLMBenchReport defines std_ttft_ms and std_tpot_ms but
they aren't shown in the generated report table; either remove these fields or
add them to the displayed metrics—update the report-generation code that
currently renders mean_ttft_ms/median_ttft_ms/p99_ttft_ms and
mean_tpot_ms/median_tpot_ms/p99_tpot_ms to also include std_ttft_ms and
std_tpot_ms (add headers, column values and formatting consistent with the other
stats), or delete std_ttft_ms/std_tpot_ms from VLLMBenchReport if intentionally
unused; ensure any serialization/deserialization and tests reference the updated
schema.
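If the choice is to display the fields, a sketch of the two extra rows, assuming the same two-column table used for the completed-requests row shown earlier (the row labels are illustrative):

```python
table.add_row("TTFT std (ms)", f"{results.std_ttft_ms:.2f}")
table.add_row("TPOT std (ms)", f"{results.std_tpot_ms:.2f}")
```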
14 files reviewed, no comments
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/cloudai/workloads/vllm/report_generation_strategy.py`:
- Around line 64-65: The cache key issue comes from passing potentially
non-normalized Path objects into parse_vllm_bench_output from
can_handle_directory; update can_handle_directory to resolve the path (e.g.,
call self.test_run.output_path.resolve() or resolve() on the
VLLM_BENCH_JSON_FILE path) before passing it to parse_vllm_bench_output so the
cached key is consistent with generate_report and other callers, and likewise
ensure any other call sites (like generate_report) also resolve the path before
invoking parse_vllm_bench_output to avoid inconsistent cache hits/misses.
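A sketch of the call site described above; the method body is an assumption, and the point is only that every caller normalizes with `resolve()` before hitting the cached parser so all callers share one cache key:

```python
def can_handle_directory(self) -> bool:
    res_file = (self.test_run.output_path / VLLM_BENCH_JSON_FILE).resolve()
    return res_file.exists() and parse_vllm_bench_output(res_file) is not None
```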
14 files reviewed, 1 comment
14 files reviewed, 3 comments
```python
@property
def gpu_ids(self) -> list[int]:
    cuda_devices = str(self.test_run.test.extra_env_vars.get("CUDA_VISIBLE_DEVICES", ""))
    if (self.tdef.cmd_args.prefill and self.tdef.cmd_args.prefill.gpu_ids) and self.tdef.cmd_args.decode.gpu_ids:
        cuda_devices = f"{self.tdef.cmd_args.prefill.gpu_ids},{self.tdef.cmd_args.decode.gpu_ids}"
    if cuda_devices:
        return [int(gpu_id) for gpu_id in cuda_devices.split(",")]
    return list(range(self.system.gpus_per_node or 1))
```
Disagg GPU override ignored
In gpu_ids (line ~41), the override branch requires self.tdef.cmd_args.prefill and self.tdef.cmd_args.prefill.gpu_ids before using the explicit decode.gpu_ids. This means a user who sets only decode.gpu_ids (or only prefill.gpu_ids) won’t get their explicit selection reflected in gpu_ids, and disaggregated scripts will fall back to splitting CUDA_VISIBLE_DEVICES/system GPUs instead. If explicit prefill.gpu_ids or decode.gpu_ids is set, gpu_ids should be derived from whichever is provided (and validated) rather than requiring both.
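A sketch of the behaviour this comment asks for, reshaping the property shown above so that any explicit `prefill.gpu_ids`/`decode.gpu_ids` selection wins on its own; this reflects the reviewer's suggestion, not the current implementation, and validation of overlaps or counts is omitted:

```python
@property
def gpu_ids(self) -> list[int]:
    prefill_ids = self.tdef.cmd_args.prefill.gpu_ids if self.tdef.cmd_args.prefill else None
    decode_ids = self.tdef.cmd_args.decode.gpu_ids
    explicit = ",".join(ids for ids in (prefill_ids, decode_ids) if ids)
    if explicit:
        # Honour whichever explicit selection is present: prefill, decode, or both.
        return [int(gpu_id) for gpu_id in explicit.split(",")]
    cuda_devices = str(self.test_run.test.extra_env_vars.get("CUDA_VISIBLE_DEVICES", ""))
    if cuda_devices:
        return [int(gpu_id) for gpu_id in cuda_devices.split(",")]
    return list(range(self.system.gpus_per_node or 1))
```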
Here are options:
1. No `gpu_ids` set -> use either system or `CUDA_VISIBLE_DEVICES`.
2. IDs are set for both `decode` and `prefill` -> ignore env var and system settings and use explicit values from a user.
3. One of `decode.gpu_ids` or `prefill.gpu_ids` is set -> unclear what to use for the other part. Potentially, we can implement smart extraction based on input and values from (1). E.g. `decode.gpu_ids = "0,1"` and total GPUs set to 4 (or via env var), then `prefill.gpu_ids` should be equal to `"2,3"`.
Only (3) is not done yet. And this logic might be very fragile and lead to unexpected results. Current approach is also questionable, though.
@podkidyshev let's discuss pros and cons here and maybe delay the decision till we get some feedback from the users.
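For option (3), the complement-based derivation being discussed could look roughly like the sketch below; it is explicitly not implemented in the PR, and the fragility concerns above still apply (duplicate IDs, gaps, or IDs outside the total range would need validation):

```python
def complement_gpu_ids(specified: str, total_gpus: int) -> str:
    """Derive the unspecified half as the remaining GPU IDs.

    E.g. specified="0,1" with total_gpus=4 yields "2,3".
    """
    used = {int(gpu_id) for gpu_id in specified.split(",") if gpu_id}
    remaining = [str(gpu_id) for gpu_id in range(total_gpus) if gpu_id not in used]
    return ",".join(remaining)
```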
Summary
Two modes are supported at the moment, single node only: aggregated and disaggregated.
Test Plan
Additional Notes